This repository has been archived by the owner on Jan 20, 2024. It is now read-only.

Assignment probabilities based on segment sizes #6

Open: 7 commits to merge into base: master

@shaycrk shaycrk commented Feb 21, 2014

We've been using this tool to create segments for email campaigns, but have run into an issue when using it to generate segments of very different sizes.

The current code loops over parent campaign members in the order they come out of the underlying datastore and tries to assign each one to a group with equal probability. In practice, small groups fill up first (presumably with older contacts, which have lower primary keys), and larger groups end up over-representing contacts assigned later (presumably newer contacts with higher primary keys). It also hurts performance: the code keeps trying to assign members to small segments after they have filled up, which results in many recursive calls to assignMember().
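The skew described above can be reproduced with a small simulation. This is a minimal Python sketch, not the tool's actual code: `assign_equal`, the member IDs, and the segment sizes are hypothetical, and a retry loop stands in for the recursive calls to assignMember().

```python
import random

def assign_equal(member_ids, sizes, rng):
    """Assign members in order, choosing a segment uniformly at random.

    When the chosen segment is already full, draw again and count a retry
    (a stand-in for the recursive call in the original tool).
    """
    segments = [[] for _ in sizes]
    retries = 0
    capacity = sum(sizes)
    for count, member in enumerate(member_ids):
        if count >= capacity:
            break  # every segment is full
        while True:
            i = rng.randrange(len(sizes))
            if len(segments[i]) < sizes[i]:
                segments[i].append(member)
                break
            retries += 1
    return segments, retries

# Member IDs stand in for primary keys: lower ID = older contact.
rng = random.Random(0)
segments, retries = assign_equal(range(10_000), [100, 9_900], rng)
small_mean = sum(segments[0]) / len(segments[0])
large_mean = sum(segments[1]) / len(segments[1])
print(small_mean, large_mean, retries)
```

With these sizes the small segment is drawn almost entirely from the first few hundred IDs, and every later member that lands on it forces a redraw.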

This patch gives each segment a probability proportional to its relative size and makes assignments based on those probabilities. As a result, members are added to smaller segments at a slower rate than to larger ones, which spreads assignments more evenly across the initial ordering of the contacts and reduces the number of recursive calls.
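As a rough illustration of size-weighted assignment, here is a Python sketch. It assumes that a draw landing on a full segment is simply repeated; the patch itself may handle that case differently. The name `assign_weighted` and the example sizes are hypothetical.

```python
import random

def assign_weighted(member_ids, sizes, rng):
    """Assign members in order, choosing segment i with probability
    sizes[i] / sum(sizes). A draw that hits a full segment is repeated
    (an assumption of this sketch)."""
    segments = [[] for _ in sizes]
    capacity = sum(sizes)
    indices = range(len(sizes))
    for count, member in enumerate(member_ids):
        if count >= capacity:
            break  # every segment is full
        while True:
            i = rng.choices(indices, weights=sizes)[0]
            if len(segments[i]) < sizes[i]:
                segments[i].append(member)
                break
    return segments

# Member IDs stand in for primary keys: lower ID = older contact.
rng = random.Random(0)
weighted = assign_weighted(range(10_000), [100, 9_900], rng)
weighted_small_mean = sum(weighted[0]) / len(weighted[0])
print(weighted_small_mean)
```

Because the small segment is chosen only about 1% of the time here, its members are drawn from across the whole ID range rather than from the oldest contacts alone.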

Patch to assign members to each list with a probability based on the relative size of each list to generate. This fixes a bug in which members were assigned to lists with equal probability (until each list hit its specified size), causing small lists to get filled up with older members (lower primary keys) and larger lists to over-represent newer members.
Added missing semicolons.
Save one loop through the sizes array by generating the CDF directly, rather than creating a PDF first.
bug fixes
Comment out system.debug() statement
The earlier commit won't cover batched loading for large lists, so updating that file with the same size-based population.
Generate the CDF when setting up the batch loader for large segments.
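The commits above mention generating the CDF directly from the sizes array. A minimal Python sketch of that idea follows, assuming the lookup is done by comparing a uniform draw against the cumulative values; `build_cdf` and `pick_segment` are hypothetical names, not functions from this repository.

```python
from bisect import bisect_right
from itertools import accumulate

def build_cdf(sizes):
    """Build cumulative probabilities from segment sizes without first
    materializing a separate per-segment probability (PDF) array."""
    total = sum(sizes)
    cdf = [running / total for running in accumulate(sizes)]
    cdf[-1] = 1.0  # guard against floating-point rounding in the last entry
    return cdf

def pick_segment(cdf, u):
    """Map a uniform draw u in [0, 1) to a segment index: the first
    segment whose cumulative probability is greater than u."""
    return bisect_right(cdf, u)

cdf = build_cdf([100, 300, 600])
print(cdf)
print(pick_segment(cdf, 0.05), pick_segment(cdf, 0.39), pick_segment(cdf, 0.999))
```

Here segment 0 covers draws in [0, 0.1), segment 1 covers [0.1, 0.4), and segment 2 covers [0.4, 1).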